Estimator Selection: End-Performance Metric Aspects
Abstract: Recently, a framework for application-oriented optimal experiment design has been introduced. In this context, the distance of the estimated system from the true one is measured in terms of a particular end-performance metric. This treatment yields better estimates of the unknown system than classical experiment designs based on the usual pointwise functional distances between the estimated and the true system. Within this new framework, the system estimator is separated from the experiment design by choosing and fixing the estimation method to either a maximum likelihood (ML) approach or a Bayesian estimator such as the minimum mean square error (MMSE) estimator. Since the MMSE estimator delivers a system estimate with lower mean square error (MSE) than the ML estimator for finite-length experiments, it is usually considered the best choice in practice in signal processing and control applications. Within the application-oriented framework, a related meaningful question is: Are there end-performance metrics for which the ML estimator outperforms the MMSE estimator when the experiment is finite-length? In this paper, we answer this question affirmatively based on a simple linear Gaussian regression example.

I. Introduction 

A basic subproblem in the context of system identification is that of experiment design. Overviews of this topic over the last decade can be found in [5], [7], [15], [8]. Contributions include convexification [10], robust design [13], [16], least-costly design [3], and closed- versus open-loop experiments [1].

Recently, a new framework for experiment design has been introduced. This framework is termed application-oriented experiment design and has been outlined in [8]. Specific investigations related to communication systems were performed in [11], [12]. Denoting the end-performance metric by $J$ and assuming that $J$ depends on the true and the estimated models, the performance is considered acceptable if $J \le 1/\gamma$ for some parameter $\gamma$, which we call the accuracy. This motivates the introduction of a set of admissible models $\mathcal{E}_{\mathrm{adm}} = \{G : J \le 1/\gamma\}$, where $G$ denotes the model to be inferred. With these definitions, the least-costly experiment is formulated as follows:

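$$
\begin{aligned}
\min_{\text{Experiment}}\quad & \text{Experimental effort}\\
\text{s.t.}\quad & \hat{G} \in \mathcal{E}_{\mathrm{adm}},
\end{aligned}
\tag{1}
$$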

where $\hat{G}$ is the estimated model. Common measures of the experimental effort are the input or output power and the experiment length. For $\hat{G}$, standard maximum likelihood (ML) and Bayesian estimation methods, e.g., the minimum mean square error (MMSE) estimator, are usually employed.

Optimizing the experiment and optimally choosing the 
system estimator are two problems that should ultimately 
be tackled in a joint context. Nevertheless, both in the 
framework of classical and application-oriented experiment 
designs, a separation strategy is applied: first, we select and fix the system estimator to a choice that is known to possess some optimality properties, e.g., the ML or MMSE estimator, and then we optimize the experiment.
For finite-length experiments the MMSE estimator is often 
considered to be superior to the ML estimator. A related 
meaningful question in the application-oriented framework 
is: Are there end-performance metrics for which the ML 
estimator outperforms the MMSE when the experiment is 
finite-length? 

In this paper, we answer the last question affirmatively based on a simple linear Gaussian regression model, which serves here as the simplest possible example that provides the necessary answer. The reason for choosing this example is twofold: besides its simplicity, it neutralizes the choice of the optimal experiment. Via this example, we re-examine the validity of the common belief that the MMSE estimator is superior to the ML estimator when finite-length experiments are used to identify the unknown system. To this end, we use appropriate mean square error (MSE)-like end-performance metrics that are meaningful in certain applications, such as communication and control systems. Finally, we numerically demonstrate the validity of our claims, confirming the analysis.

This paper is organized as follows: Section II defines the problem of designing the system estimator with respect to the end-performance metric. Section III presents some results and comments that will be useful in the rest of the paper and introduces the approximations of the performance metrics on which the subsequent analysis is based. The optimality of the ML and MMSE system estimators with respect to the minimization of the aforementioned MSE-like end-performance metrics is examined in Section IV. Section V illustrates the validity of the derived results. Finally, Section VI concludes the paper.

Notations: Vectors are denoted by bold letters. Superscripts $T$ and $H$ stand for transposition and Hermitian transposition, respectively, and $^*$ denotes complex conjugation. $|\cdot|$ is the complex modulus and $\|\cdot\|$ the Euclidean norm. For a vector $\mathbf{a}$, $a(m)$ denotes its $m$-th entry. The expectation operator is denoted by $E[\cdot]$. Finally, $\mathcal{CN}(\mu, \sigma^2)$ denotes the complex Gaussian distribution with mean $\mu$ and variance $\sigma^2$.


II. Problem Statement 
Consider the scalar linear Gaussian model 

$$ y(n) = \theta\,u(n) + e(n), \tag{2} $$

where $y(n)$ is the observed signal at time instant $n$, $\theta$ is the unknown system parameter, assumed to be complex-valued, $u(n)$ is the input at the same time instant, and $e(n)$ is complex, circularly symmetric Gaussian noise with zero mean and variance $\sigma_e^2$. We further assume that $E[u(n)] = 0$ and $E[|u(n)|^2] = \sigma_u^2$. In addition, $e(n)$ and $u(n)$ are independent random sequences, while $e(n)$ is a white random sequence.

Assume that the experiment length is limited to $N$ time slots and that the maximum allowed input energy for experimental purposes is $\mathcal{E}$. We can collect the received samples corresponding to the experiment in one vector:


$$ \mathbf{y}_{\mathrm{exp}} = \theta\,\mathbf{u}_{\mathrm{exp}} + \mathbf{e}_{\mathrm{exp}}, \tag{3} $$

where $\mathbf{y}_{\mathrm{exp}} = [y(l-N+1), y(l-N+2), \ldots, y(l)]^T$ is the vector of $N$ received samples corresponding to the experiment, $\mathbf{u}_{\mathrm{exp}} = [u(l-N+1), u(l-N+2), \ldots, u(l)]^T$ is the vector of $N$ input symbols, and $\mathbf{e}_{\mathrm{exp}} = [e(l-N+1), e(l-N+2), \ldots, e(l)]^T$ is the vector of $N$ noise samples. Considering the class of linear parameter estimators, the system is estimated as follows:

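$$ \hat{\theta} = \mathbf{f}^H\mathbf{y}_{\mathrm{exp}} = \theta\,\mathbf{f}^H\mathbf{u}_{\mathrm{exp}} + \mathbf{f}^H\mathbf{e}_{\mathrm{exp}}, \tag{4} $$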


where $\mathbf{f}$ is an $N \times 1$ estimating filter.

A possible performance metric is the MSE of a linear input estimator. The input estimator uses the system knowledge to deliver an estimate of the input variable. We call clairvoyant the input estimator that has perfect system knowledge. Denoting the corresponding estimating filter by $c(\theta)$, its expression can be found as follows:


$$ c(\theta) = \arg\min_{c(\theta)} E\left[\left|c(\theta)\,y(n) - u(n)\right|^2\right], \tag{5} $$


where the expectation is taken over the statistics of $u(n)$ and $e(n)$. Setting the derivative of the last expression with respect to $c(\theta)$ to zero and solving for $c(\theta)$, the optimal clairvoyant input estimating filter is given by


$$ c(\theta) = \frac{\sigma_u^2\,\theta^*}{|\theta|^2\sigma_u^2 + \sigma_e^2}. \tag{6} $$


We will call this the MMSE clairvoyant input estimator¹. We observe that as the signal-to-noise ratio (SNR) increases, i.e., $\sigma_e^2 \to 0$, $c(\theta) \to 1/\theta$. We call $c(\theta) = 1/\theta$ the zero-forcing (ZF) clairvoyant input estimator. Due to this convergence and for simplicity, we focus only on the ZF input estimator in the sequel.
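As a quick sanity check of (6) and of the ZF limit, the sketch below compares the closed-form coefficient with a least-squares fit of $c$ in $c\,y(n) \approx u(n)$ on simulated data. It assumes, for illustration only, that $u(n)$ is $\mathcal{CN}(0, \sigma_u^2)$ (the text fixes only its mean and variance); the helper names `cgauss`, `c_mmse`, `c_zf` and `empirical_c` are hypothetical.

```python
import random


def cgauss(rng: random.Random, var: float) -> complex:
    """One draw from CN(0, var): real and imaginary parts are i.i.d. N(0, var/2)."""
    s = (var / 2) ** 0.5
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))


def c_mmse(theta: complex, var_u: float, var_e: float) -> complex:
    """Clairvoyant MMSE input-estimating coefficient, eq. (6)."""
    return var_u * theta.conjugate() / (abs(theta) ** 2 * var_u + var_e)


def c_zf(theta: complex) -> complex:
    """Zero-forcing clairvoyant coefficient, c(theta) = 1/theta."""
    return 1 / theta


def empirical_c(theta: complex, var_u: float, var_e: float,
                n: int = 100_000, seed: int = 0) -> complex:
    """Least-squares c minimizing sum |c*y - u|^2, i.e. sum(u*conj(y)) / sum(|y|^2)."""
    rng = random.Random(seed)
    num, den = 0j, 0.0
    for _ in range(n):
        u = cgauss(rng, var_u)               # assumed Gaussian input
        y = theta * u + cgauss(rng, var_e)   # model (2)
        num += u * y.conjugate()
        den += abs(y) ** 2
    return num / den


theta = 0.8 - 0.6j
print(c_mmse(theta, 1.0, 0.5), empirical_c(theta, 1.0, 0.5))
print(abs(c_mmse(theta, 1.0, 1e-9) - c_zf(theta)))  # tiny: ZF limit as var_e -> 0
```

The least-squares fit converges to $E[u(n)y^*(n)]/E[|y(n)|^2] = \sigma_u^2\theta^*/(|\theta|^2\sigma_u^2 + \sigma_e^2)$, which is exactly (6).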

We can now introduce an end-performance metric of interest, which will be used in the following analysis. Given an input estimator, we define the excess of the input estimate obtained with an input estimator that only knows a system estimate $\hat{\theta}$ over the input estimate obtained with perfect system knowledge, thus leading to

$$ \mathrm{MSE}_{\mathrm{ex}} = E\left[\left|c(\hat{\theta})\,y(n) - c(\theta)\,y(n)\right|^2\right]. \tag{7} $$

In the sequel, this metric will be called the excess MSE.

¹The multiplication by $y(n)$ is considered implicit.

Our goal is to determine the optimal parameter estimators for fixed experiments of finite length so that $\mathrm{MSE}_{\mathrm{ex}}$ based on the ZF input estimator is minimized. To this end, the following section presents some useful preliminary results.


III. Preliminary Results 


Consider the ML estimator. For linear Gaussian regression, this estimator coincides with the minimum variance unbiased (MVU) estimator. We therefore refer to the MVU estimator instead of the ML estimator from now on. Since, by (4), $E[\hat{\theta}] = \theta\,\mathbf{f}^H\mathbf{u}_{\mathrm{exp}}$, unbiasedness for every $\theta$ requires $\mathbf{f}^H\mathbf{u}_{\mathrm{exp}} = 1$, and this condition guarantees $E[\hat{\theta}] = \theta$. Under our assumptions, the MVU estimator can be found by solving the following optimization problem:

$$ \min_{\mathbf{f}}\ \sigma_e^2\|\mathbf{f}\|^2 \quad \text{s.t.} \quad \mathbf{f}^H\mathbf{u}_{\mathrm{exp}} = 1. \tag{8} $$


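Forming the Lagrangian for this problem and setting its gradient with respect to $\mathbf{f}$ to zero, we get

$$ \mathbf{f}_{\mathrm{MVU}} = \frac{\mathbf{u}_{\mathrm{exp}}}{\|\mathbf{u}_{\mathrm{exp}}\|^2}. \tag{9} $$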
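For completeness, one way to carry out the Lagrangian step behind (9) is sketched below; the use of the Wirtinger derivative with respect to $\mathbf{f}^*$ and of a complex multiplier $\mu$ are choices made for this sketch.

$$
\begin{aligned}
\mathcal{L}(\mathbf{f},\mu) &= \sigma_e^2\,\mathbf{f}^H\mathbf{f} - \mu\left(\mathbf{u}_{\mathrm{exp}}^H\mathbf{f} - 1\right) - \mu^*\left(\mathbf{f}^H\mathbf{u}_{\mathrm{exp}} - 1\right),\\
\frac{\partial \mathcal{L}}{\partial \mathbf{f}^*} &= \sigma_e^2\,\mathbf{f} - \mu^*\,\mathbf{u}_{\mathrm{exp}} = \mathbf{0} \;\Rightarrow\; \mathbf{f} = \frac{\mu^*}{\sigma_e^2}\,\mathbf{u}_{\mathrm{exp}},\\
\mathbf{f}^H\mathbf{u}_{\mathrm{exp}} &= \frac{\mu\,\|\mathbf{u}_{\mathrm{exp}}\|^2}{\sigma_e^2} = 1 \;\Rightarrow\; \mu = \frac{\sigma_e^2}{\|\mathbf{u}_{\mathrm{exp}}\|^2} \;\Rightarrow\; \mathbf{f} = \frac{\mathbf{u}_{\mathrm{exp}}}{\|\mathbf{u}_{\mathrm{exp}}\|^2}.
\end{aligned}
$$

The resulting estimator variance is $\sigma_e^2\|\mathbf{f}_{\mathrm{MVU}}\|^2 = \sigma_e^2/\|\mathbf{u}_{\mathrm{exp}}\|^2$.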


If we assume that $\theta$ is a random variable with a known prior distribution, then the MMSE parameter estimator could be used instead of the MVU one. Under our assumptions and the extra assumption that $E[\theta] = 0$, one can obtain [14]

$$ \mathbf{f}_{\mathrm{MMSE}} = \frac{E[|\theta|^2]\,\mathbf{u}_{\mathrm{exp}}}{E[|\theta|^2]\,\|\mathbf{u}_{\mathrm{exp}}\|^2 + \sigma_e^2}. \tag{10} $$
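The filters (9) and (10) are easy to check numerically. The sketch below assumes a $\mathcal{CN}(0, E[|\theta|^2])$ prior for $\theta$ purely for illustration, builds both filters, forms $\hat{\theta} = \mathbf{f}^H\mathbf{y}_{\mathrm{exp}}$ as in (4), and estimates the Bayesian parameter MSE by Monte Carlo; all function names and the training vector are hypothetical. For reference, the MVU filter gives $\sigma_e^2/\|\mathbf{u}_{\mathrm{exp}}\|^2$ and the MMSE filter gives $E[|\theta|^2]\sigma_e^2/(E[|\theta|^2]\|\mathbf{u}_{\mathrm{exp}}\|^2 + \sigma_e^2)$.

```python
import random


def cgauss(rng: random.Random, var: float) -> complex:
    """One draw from CN(0, var)."""
    s = (var / 2) ** 0.5
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))


def inner(a, b):
    """a^H b for complex vectors stored as lists."""
    return sum(x.conjugate() * y for x, y in zip(a, b))


def f_mvu(u):
    """Eq. (9): f = u / ||u||^2."""
    energy = inner(u, u).real
    return [x / energy for x in u]


def f_mmse(u, m2, var_e):
    """Eq. (10), with m2 = E[|theta|^2] and E[theta] = 0."""
    energy = inner(u, u).real
    scale = m2 / (m2 * energy + var_e)
    return [scale * x for x in u]


def bayes_mse(f, u, m2, var_e, trials=50_000, seed=1):
    """Monte Carlo E|theta_hat - theta|^2 with theta ~ CN(0, m2) (assumed prior)."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(trials):
        theta = cgauss(rng, m2)
        y = [theta * x + cgauss(rng, var_e) for x in u]   # eq. (3)
        acc += abs(inner(f, y) - theta) ** 2              # eq. (4)
    return acc / trials


u = [1 + 0j, 1j]          # hypothetical training vector, ||u||^2 = 2
m2, var_e = 1.0, 1.0
print(bayes_mse(f_mvu(u), u, m2, var_e))              # should be close to 0.5
print(bayes_mse(f_mmse(u, m2, var_e), u, m2, var_e))  # should be close to 1/3
```

This only confirms the familiar fact that the MMSE filter has the lower parameter MSE; the rest of the paper asks whether that ordering survives a different end-performance metric.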


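Assuming that $\theta$ is a deterministic but unknown variable, the $\mathrm{MSE}_{\mathrm{ex}}$ of the ZF input estimator follows from (7) with $c(\theta) = 1/\theta$:

$$ \mathrm{MSE}^{d}_{\mathrm{ex}}(\mathrm{ZF}) = E\left[\left|\frac{y(n)}{\hat{\theta}} - \frac{y(n)}{\theta}\right|^2\right] = \left(\sigma_u^2 + \frac{\sigma_e^2}{|\theta|^2}\right) E\left[\frac{|\hat{\theta} - \theta|^2}{|\hat{\theta}|^2}\right]. \tag{11} $$

Here, the superscript "d" stands for "deterministic". If $\theta$ is assumed to be a random variable, then the corresponding end-performance metric $\mathrm{MSE}^{r}_{\mathrm{ex}}(\mathrm{ZF})$ is obtained by averaging the last expression over $\theta$.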
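Because $\hat{\theta}$ depends only on $\mathbf{e}_{\mathrm{exp}}$, it is independent of $u(n)$ and $e(n)$, and $E[|y(n)|^2] = |\theta|^2\sigma_u^2 + \sigma_e^2$; hence, for a fixed $\hat{\theta}$, the expectation over $u(n)$ and $e(n)$ reduces to $(\sigma_u^2 + \sigma_e^2/|\theta|^2)\,|\hat{\theta} - \theta|^2/|\hat{\theta}|^2$. The Monte Carlo sketch below checks this reduction for one pair $(\theta, \hat{\theta})$; the Gaussian input, the numerical values and the helper names are illustrative assumptions.

```python
import random


def cgauss(rng: random.Random, var: float) -> complex:
    """One draw from CN(0, var)."""
    s = (var / 2) ** 0.5
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))


def excess_closed_form(theta: complex, theta_hat: complex,
                       var_u: float, var_e: float) -> float:
    """(sigma_u^2 + sigma_e^2/|theta|^2) * |theta_hat - theta|^2 / |theta_hat|^2."""
    return ((var_u + var_e / abs(theta) ** 2)
            * abs(theta_hat - theta) ** 2 / abs(theta_hat) ** 2)


def excess_monte_carlo(theta: complex, theta_hat: complex, var_u: float,
                       var_e: float, n: int = 100_000, seed: int = 2) -> float:
    """Average of |y/theta_hat - y/theta|^2 over u(n), e(n) for a fixed theta_hat."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(n):
        y = theta * cgauss(rng, var_u) + cgauss(rng, var_e)  # model (2)
        acc += abs(y / theta_hat - y / theta) ** 2            # ZF: c(theta) = 1/theta
    return acc / n


theta, theta_hat = 1 + 0.5j, 0.9 + 0.4j
print(excess_closed_form(theta, theta_hat, 1.0, 0.2),
      excess_monte_carlo(theta, theta_hat, 1.0, 0.2))
```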

Depending on the probability distributions of $|\hat{\theta}|$ and $|\theta|$, the above MSE expressions may fail to exist. The MSEs will be finite if the probability density function (pdf) of $|\hat{\theta}|$ is of order $O(|\hat{\theta}|^2)$ as $\hat{\theta} \to 0$. A similar condition should hold for the pdf of $|\theta|$ in the case of $\mathrm{MSE}^{r}_{\mathrm{ex}}(\mathrm{ZF})$. Otherwise, we end up with an infinite moment problem. In order to obtain well-behaved parameter estimators to be used in conjunction with the actual performance metric, some sort of regularization is needed. Ideas for appropriate regularization techniques may be obtained by modifying robust estimators (against heavy-tailed distributions), e.g., by trimming a standard estimator if it gives a value very close to zero [9]. An example of such a trimmed estimator is the following:


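$$
\hat{\theta} =
\begin{cases}
\mathbf{f}^H\mathbf{y}_{\mathrm{exp}}, & \text{if } |\mathbf{f}^H\mathbf{y}_{\mathrm{exp}}| > \lambda,\\[4pt]
\lambda\,\dfrac{\mathbf{f}^H\mathbf{y}_{\mathrm{exp}}}{|\mathbf{f}^H\mathbf{y}_{\mathrm{exp}}|}, & \text{otherwise},
\end{cases}
\tag{12}
$$

where $\mathbf{f}$ can be any estimating filter and $\lambda$ is a regularization parameter².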

Remark: The definition of the trimmed $\hat{\theta}$ preserves continuity at $|\mathbf{f}^H\mathbf{y}_{\mathrm{exp}}| = \lambda$. Additionally, the event $\{\mathbf{f}^H\mathbf{y}_{\mathrm{exp}} = 0\}$ has zero probability, since the distribution of $\mathbf{f}^H\mathbf{y}_{\mathrm{exp}}$ is continuous. Therefore, in this case $\hat{\theta}$ can be defined arbitrarily, e.g., $\hat{\theta} = \lambda$.
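A direct transcription of the trimming rule (12) is sketched below (the function name `trimmed` is hypothetical). It keeps $\mathbf{f}^H\mathbf{y}_{\mathrm{exp}}$ when its modulus exceeds $\lambda$ and otherwise projects it onto the circle of radius $\lambda$ with the same phase, so $|\hat{\theta}| \ge \lambda$ always holds and $1/|\hat{\theta}|^2 \le 1/\lambda^2$ stays bounded; the zero-probability input 0 is mapped to $\lambda$, as the remark allows.

```python
def trimmed(raw: complex, lam: float) -> complex:
    """Trimmed estimate (12) from raw = f^H y_exp and regularization parameter lam > 0."""
    r = abs(raw)
    if r > lam:
        return raw              # ordinary linear estimate
    if r == 0.0:
        return complex(lam, 0)  # zero-probability event: arbitrary choice theta_hat = lam
    return lam * raw / r        # keep the phase, set the modulus to lam


print(trimmed(0.5 + 0.2j, 0.1))    # unchanged
print(trimmed(0.03 + 0.04j, 0.1))  # projected to modulus 0.1
```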

Assume a fixed $\lambda$. Then, for a sufficiently small $\lambda$ and a sufficiently high SNR during training, minimizing $\mathrm{MSE}^{d}_{\mathrm{ex}}(\mathrm{ZF})$ is approximately equivalent to minimizing the approximation


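$$ [\mathrm{MSE}^{d}_{\mathrm{ex}}(\mathrm{ZF})]_0 = \left(\sigma_u^2 + \frac{\sigma_e^2}{|\theta|^2}\right)\frac{E[|\hat{\theta} - \theta|^2]}{E[|\hat{\theta}|^2]}, \tag{13} $$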


as we show in the appendix. Using some minor additional 
technicalities, we can work with 


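$$ [\mathrm{MSE}^{r}_{\mathrm{ex}}(\mathrm{ZF})]_0 = \frac{\sigma_u^2\,E_\theta\!\left[|\theta|^2\,E[|\hat{\theta} - \theta|^2]\right] + \sigma_e^2\,E_\theta\!\left[E[|\hat{\theta} - \theta|^2]\right]}{E_\theta\!\left[|\theta|^2\,E[|\hat{\theta}|^2]\right]} \tag{14} $$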



instead of $\mathrm{MSE}^{r}_{\mathrm{ex}}(\mathrm{ZF})$. We call these two approximations zeroth order input estimate excess MSEs. The following analysis and results are based on the zeroth order metrics, and they reveal how the selection of the system estimator depends on the end-performance metric under consideration.

Remarks:

1) A useful alternative way to view the zeroth order MSEs is as affine versions of normalized parameter MSEs, where the true parameter is $\theta$ and the estimator is $\hat{\theta}$.

2) In the definition of (13), one can observe that approximating the mean value of the ratio by the ratio of the mean values eliminates the infinite moment problem. In the following, all zeroth order metrics are defined based on the non-trimmed $\hat{\theta}$ to ease the derivations. This treatment is approximately valid when $\lambda$ is sufficiently small.
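The Monte Carlo sketch below illustrates the zeroth order approximation at one operating point of our own choosing: deterministic $\theta = 1$, $\mathbf{u}_{\mathrm{exp}} = [1, 1]^T$, the MVU filter (9), $\sigma_e^2 = 0.01$ (high training SNR) and $\lambda = 0.1$. It compares the mean of the ratio $E[|\hat{\theta}-\theta|^2/|\hat{\theta}|^2]$ for the trimmed estimator (12) with the ratio of the means used in (13). It is a single-point illustration, not a proof, and the function name is hypothetical.

```python
import random


def cgauss(rng: random.Random, var: float) -> complex:
    """One draw from CN(0, var)."""
    s = (var / 2) ** 0.5
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))


def ratio_stats(theta: complex, u: list, var_e: float, lam: float,
                n: int = 100_000, seed: int = 3):
    """Return (mean of ratio, ratio of means) for the trimmed MVU estimate."""
    rng = random.Random(seed)
    energy = sum(abs(x) ** 2 for x in u)
    f = [x / energy for x in u]                                  # MVU filter, eq. (9)
    sum_ratio = sum_num = sum_den = 0.0
    for _ in range(n):
        y = [theta * x + cgauss(rng, var_e) for x in u]          # eq. (3)
        raw = sum(fk.conjugate() * yk for fk, yk in zip(f, y))   # f^H y_exp
        r = abs(raw)
        if r > lam:                                              # trimming, eq. (12)
            est = raw
        else:
            est = lam * raw / r if r > 0 else complex(lam, 0)
        err2, mag2 = abs(est - theta) ** 2, abs(est) ** 2
        sum_ratio += err2 / mag2
        sum_num += err2
        sum_den += mag2
    return sum_ratio / n, sum_num / sum_den


mean_of_ratio, ratio_of_means = ratio_stats(1 + 0j, [1 + 0j, 1 + 0j], 0.01, 0.1)
print(mean_of_ratio, ratio_of_means)   # both roughly sigma_e^2 / ||u||^2 = 0.005
```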


IV. Minimizing the Zeroth Order Excess MSE

In this section, we investigate the selection of the system estimator for the zeroth order excess MSE in the case of the ZF input estimator.

A. ZF Input Estimator with a Deterministic System

The expectation operators in (13) are with respect to $\mathbf{e}_{\mathrm{exp}}$, $u(n)$, and $e(n)$. In this case, we have

$$ [\mathrm{MSE}^{d}_{\mathrm{ex}}(\mathrm{ZF})]_0 = \frac{|\theta|^2\,|\mathbf{f}^H\mathbf{u}_{\mathrm{exp}} - 1|^2 + \sigma_e^2\|\mathbf{f}\|^2}{|\theta|^2\,|\mathbf{f}^H\mathbf{u}_{\mathrm{exp}}|^2 + \sigma_e^2\|\mathbf{f}\|^2}\left(\sigma_u^2 + \frac{\sigma_e^2}{|\theta|^2}\right). \tag{15} $$

The numerator of the gradient of the above expression with respect to $\mathbf{f}$³ is given by

$$ \left[|\theta|^2|\psi|^2 + \sigma_e^2\|\mathbf{f}\|^2\right]\left[|\theta|^2(\psi - 1)^*\,\mathbf{u}_{\mathrm{exp}} + \sigma_e^2\,\mathbf{f}\right] - \left[|\theta|^2\psi^*\,\mathbf{u}_{\mathrm{exp}} + \sigma_e^2\,\mathbf{f}\right]\left[|\theta|^2|\psi - 1|^2 + \sigma_e^2\|\mathbf{f}\|^2\right], \tag{16} $$

where $\psi = \mathbf{f}^H\mathbf{u}_{\mathrm{exp}}$. Setting $\mathbf{f} = \mathbf{f}_{\mathrm{MVU}}$, i.e., $\psi = 1$ and $\|\mathbf{f}\|^2 = 1/\|\mathbf{u}_{\mathrm{exp}}\|^2$, both products in (16) equal $\sigma_e^2\left(|\theta|^2 + \sigma_e^2/\|\mathbf{u}_{\mathrm{exp}}\|^2\right)\mathbf{u}_{\mathrm{exp}}/\|\mathbf{u}_{\mathrm{exp}}\|^2$, so the above expression becomes zero. Therefore:

Proposition 1: The MVU is an optimal system estimator for the task of minimizing $[\mathrm{MSE}^{d}_{\mathrm{ex}}(\mathrm{ZF})]_0$ when the system parameter is considered a deterministic but otherwise unknown quantity.

Remark: Note that even though $[\mathrm{MSE}^{d}_{\mathrm{ex}}(\mathrm{ZF})]_0$ depends on the unknown system parameter $\theta$, the optimal system estimator does not in this case.

B. ZF Input Estimator with a Random System

In this case, the prior statistics of $\theta$ are known. The zeroth order excess MSE is given by

$$ [\mathrm{MSE}^{r}_{\mathrm{ex}}(\mathrm{ZF})]_0 = \frac{|\psi - 1|^2\left(E[|\theta|^4]\,\sigma_u^2 + E[|\theta|^2]\,\sigma_e^2\right) + \sigma_e^2\|\mathbf{f}\|^2\left(E[|\theta|^2]\,\sigma_u^2 + \sigma_e^2\right)}{E[|\theta|^4]\,|\psi|^2 + \sigma_e^2\|\mathbf{f}\|^2\,E[|\theta|^2]}. \tag{17} $$

Differentiating this expression with respect to $\mathbf{f}$ and setting $\mathbf{f} = \mathbf{f}_{\mathrm{MVU}}$, we obtain a zero gradient. Therefore:

Proposition 2: The MVU is an optimal system estimator for the task of minimizing $[\mathrm{MSE}^{r}_{\mathrm{ex}}(\mathrm{ZF})]_0$ when the system parameter is considered random.

Via tedious calculations, we can show that the MMSE system estimator does not zero the gradient.

Remark: This result is counterintuitive: it says that when one has knowledge of the system statistics but uses a ZF input estimator, one should ignore these statistics when choosing a system estimator for minimizing the zeroth order excess MSE. This is the major result of this paper: the belief that the MMSE system estimator is better than the MVU/ML system estimator for any performance metric when finite-length experiments are used to identify the system is not valid.
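Propositions 1 and 2 can be probed numerically. The sketch below evaluates (15), the gradient numerator (16), and (17) for the MVU filter (9) and the MMSE filter (10) at one hypothetical operating point: $\mathbf{u}_{\mathrm{exp}} = [1, j]^T$, $\sigma_u^2 = \sigma_e^2 = 1$, $\theta = 0.7 - 0.2j$ in the deterministic case, and $\theta \sim \mathcal{CN}(0,1)$, i.e., $E[|\theta|^2] = 1$ and $E[|\theta|^4] = 2$, in the random case. It checks that (16) vanishes at the MVU filter but not at the MMSE filter, and that the MVU filter gives the smaller zeroth order excess MSE in both cases. This is an illustration at a single point, not a substitute for the propositions; the function names are hypothetical.

```python
def inner(a, b):
    """a^H b for complex vectors stored as lists."""
    return sum(x.conjugate() * y for x, y in zip(a, b))


def norm2(a):
    """Squared Euclidean norm ||a||^2."""
    return inner(a, a).real


def f_mvu(u):
    return [x / norm2(u) for x in u]                      # eq. (9)


def f_mmse(u, m2, var_e):
    scale = m2 / (m2 * norm2(u) + var_e)                  # eq. (10)
    return [scale * x for x in u]


def zeroth_det(f, u, theta, var_u, var_e):
    """Deterministic zeroth order excess MSE, eq. (15)."""
    psi, s, t2 = inner(f, u), var_e * norm2(f), abs(theta) ** 2
    return ((t2 * abs(psi - 1) ** 2 + s) / (t2 * abs(psi) ** 2 + s)
            * (var_u + var_e / t2))


def grad_numerator_det(f, u, theta, var_e):
    """Gradient numerator of (15) w.r.t. f, eq. (16) (positive scalars dropped)."""
    psi, s, t2 = inner(f, u), var_e * norm2(f), abs(theta) ** 2
    den = t2 * abs(psi) ** 2 + s          # denominator of (15)
    num = t2 * abs(psi - 1) ** 2 + s      # numerator of (15)
    return [den * (t2 * (psi - 1).conjugate() * uk + var_e * fk)
            - num * (t2 * psi.conjugate() * uk + var_e * fk)
            for fk, uk in zip(f, u)]


def zeroth_rand(f, u, m2, m4, var_u, var_e):
    """Random-parameter zeroth order excess MSE, eq. (17)."""
    psi, s = inner(f, u), var_e * norm2(f)
    num = abs(psi - 1) ** 2 * (m4 * var_u + m2 * var_e) + s * (m2 * var_u + var_e)
    return num / (m4 * abs(psi) ** 2 + s * m2)


u, theta = [1 + 0j, 1j], 0.7 - 0.2j
var_u = var_e = 1.0
m2, m4 = 1.0, 2.0                      # moments of CN(0, 1)
mvu, mmse = f_mvu(u), f_mmse(u, m2, var_e)
print(max(abs(g) for g in grad_numerator_det(mvu, u, theta, var_e)))  # ~0
print(zeroth_det(mvu, u, theta, var_u, var_e), zeroth_det(mmse, u, theta, var_u, var_e))
print(zeroth_rand(mvu, u, m2, m4, var_u, var_e), zeroth_rand(mmse, u, m2, m4, var_u, var_e))
```

At this point, (17) evaluates to 0.4 for the MVU filter and 0.7 for the MMSE filter.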

²This parameter can be tuned via cross-validation or any other technique, although in the simulation section we select it empirically for simplicity.

³Discarding the positive scalars and considering again the corresponding (Hermitian) transpositions.

C. Discussion on the Optimal Training

Since the system estimator is selected in order to optimize the final performance metric, one may consider the problem of optimally selecting the input vector $\mathbf{u}_{\mathrm{exp}}$ under a maximum energy constraint $\|\mathbf{u}_{\mathrm{exp}}\|^2 \le \mathcal{E}$ to serve the same purpose. To optimize the input vector, one should first fix the system estimator. This is a "complementary" problem with respect to the approach followed so far. Suppose that we use either the MVU or the MMSE system estimator. One can observe that for $N = 1$ the problem of optimally selecting the input vector is meaningless. For $N > 1$, fixing, for example, $\mathbf{f} = \mathbf{f}_{\mathrm{MVU}}$, one can observe that the problem of optimally selecting the input vector is again meaningless. Consider, for example, the case of $[\mathrm{MSE}^{r}_{\mathrm{ex}}(\mathrm{ZF})]_0$. We then have


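$$ [\mathrm{MSE}^{r}_{\mathrm{ex}}(\mathrm{ZF})]_0 = \frac{\sigma_e^2\left(E[|\theta|^2]\,\sigma_u^2 + \sigma_e^2\right)}{E[|\theta|^4]\,\|\mathbf{u}_{\mathrm{exp}}\|^2 + \sigma_e^2\,E[|\theta|^2]}, $$

which depends on $\mathbf{u}_{\mathrm{exp}}$ only through $\|\mathbf{u}_{\mathrm{exp}}\|^2$. Furthermore, $[\mathrm{MSE}^{r}_{\mathrm{ex}}(\mathrm{ZF})]_0$ is minimized when $\|\mathbf{u}_{\mathrm{exp}}\|^2 = \mathcal{E}$, which is intuitively appealing. Therefore, any $\mathbf{u}_{\mathrm{exp}}$ with energy equal to $\mathcal{E}$ is an equally good input vector for the MVU estimator. Thus, by Proposition 2, for the same $\mathbf{u}_{\mathrm{exp}}$, the MVU estimator is better than the MMSE estimator.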
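The claim that, with the MVU filter, the zeroth order metric depends on $\mathbf{u}_{\mathrm{exp}}$ only through its energy can be checked by evaluating (17) at $\mathbf{f} = \mathbf{f}_{\mathrm{MVU}}$ for different training vectors. In the sketch below, the training vectors and the moments $E[|\theta|^2] = 1$, $E[|\theta|^4] = 2$ (those of $\mathcal{CN}(0,1)$) are illustrative choices, and the function names are hypothetical.

```python
import math


def inner(a, b):
    """a^H b for complex vectors stored as lists."""
    return sum(x.conjugate() * y for x, y in zip(a, b))


def zeroth_rand_mvu(u, m2, m4, var_u, var_e):
    """Eq. (17) evaluated at f = u / ||u||^2 (MVU filter, eq. (9))."""
    energy = inner(u, u).real
    f = [x / energy for x in u]
    psi, s = inner(f, u), var_e * inner(f, f).real
    num = abs(psi - 1) ** 2 * (m4 * var_u + m2 * var_e) + s * (m2 * var_u + var_e)
    return num / (m4 * abs(psi) ** 2 + s * m2)


def closed_form(energy, m2, m4, var_u, var_e):
    """Closed form above: depends on the training vector only through its energy."""
    return var_e * (m2 * var_u + var_e) / (m4 * energy + var_e * m2)


args = (1.0, 2.0, 1.0, 1.0)                    # m2, m4, var_u, var_e
u_a = [2 + 0j, 0j]                             # energy 4
u_b = [math.sqrt(2) + 0j, 1j * math.sqrt(2)]   # energy 4, different direction
u_c = [1 + 0j, 0j]                             # energy 1
print(zeroth_rand_mvu(u_a, *args), zeroth_rand_mvu(u_b, *args), closed_form(4.0, *args))
print(zeroth_rand_mvu(u_c, *args))             # larger: less training energy
```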


V. Simulations 

In this section, we present numerical results to verify our analysis. In all figures, $\theta \sim \mathcal{CN}(0, 1)$. The SNR during the experiment determines how accurate the system estimate is. The parameter $\lambda$ has been empirically selected to be 0.1 in Fig. 2. The two figures presented in this section serve two goals: first, to highlight that the MVU/ML estimator can indeed be better than the MMSE estimator in finite-length system identification, depending on the end-performance metric of interest; and second, to verify that the zeroth order approximations used in this paper for analysis purposes approximate the true end-performance metrics well enough to draw the necessary conclusions.

Fig. 1 presents the corresponding results for $[\mathrm{MSE}^{r}_{\mathrm{ex}}(\mathrm{ZF})]_0$. The SNR during the experiment has been set to 0 dB, which can be a low operational value in real-world applications but is relevant, e.g., in situations where energy efficiency is crucial, such as in wireless sensor networks. The experiment length has been set to $N = 2$ simply to rule out the asymptotic efficiency of the ML estimator. The MVU is the best estimator, as proven. This example contradicts what one would expect and confirms the motivation of this paper.
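To connect the closed form (17) with its definition (14), the sketch below estimates the ratio of means in (14) by Monte Carlo with $\theta \sim \mathcal{CN}(0, 1)$ and $N = 2$, for both the MVU and MMSE filters. It assumes, for illustration, $\mathbf{u}_{\mathrm{exp}} = [1, 1]^T$, $\sigma_e^2 = 1$ (0 dB if the training SNR is taken as $\|\mathbf{u}_{\mathrm{exp}}\|^2/(N\sigma_e^2)$, a definition assumed here), and $\sigma_u^2 = 1$. It is a single operating point, not a reproduction of Fig. 1, and the function name is hypothetical.

```python
import random


def cgauss(rng, var):
    """One draw from CN(0, var)."""
    s = (var / 2) ** 0.5
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))


def zeroth_order_mc(u, var_u, var_e, n=100_000, seed=4):
    """Monte Carlo ratio of means in (14) for the MVU and MMSE filters, theta ~ CN(0, 1)."""
    rng = random.Random(seed)
    energy = sum(abs(x) ** 2 for x in u)
    filters = {
        "MVU": [x / energy for x in u],               # eq. (9)
        "MMSE": [x / (energy + var_e) for x in u],    # eq. (10) with E|theta|^2 = 1
    }
    num = {k: 0.0 for k in filters}
    den = {k: 0.0 for k in filters}
    for _ in range(n):
        theta = cgauss(rng, 1.0)
        y = [theta * x + cgauss(rng, var_e) for x in u]              # eq. (3)
        t2 = abs(theta) ** 2
        for name, f in filters.items():
            est = sum(fk.conjugate() * yk for fk, yk in zip(f, y))   # eq. (4)
            num[name] += (var_u * t2 + var_e) * abs(est - theta) ** 2
            den[name] += t2 * abs(est) ** 2
    return {k: num[k] / den[k] for k in filters}


res = zeroth_order_mc([1 + 0j, 1 + 0j], var_u=1.0, var_e=1.0)
print(res)   # compare with (17): 0.4 for MVU and 0.7 for MMSE at this point
```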

Finally, Fig. 2 verifies that the zeroth order metrics used in this paper are good approximations in the sense that they correctly indicate which estimator is uniformly better than the MMSE. The SNR during the experiment and the experiment length are as before. We observe that, apart from a vertical shift, the zeroth order approximations capture the relative position of the estimator curves, leading to accurate conclusions about their comparison.

VI. Conclusions 

In this paper, system estimator selection based on the end-performance metric has been investigated. We have shown that application-oriented selection is the right way to choose estimators in practice. We have verified this observation based on an explanatory end-performance metric of interest, namely, the excess MSE of the input estimate. The conclusion drawn is that the ML/MVU estimators can be better than the MMSE estimator for particular end-performance metrics of interest. This invalidates the common belief that the MMSE estimator is always better than the ML/MVU estimators for any end-performance metric when finite-length experiments are used for system identification.

Fig. 1. $[\mathrm{MSE}^{r}_{\mathrm{ex}}(\mathrm{ZF})]_0$ with SNR during the experiment equal to 0 dB and $N = 2$.

Fig. 2. $\mathrm{MSE}^{r}_{\mathrm{ex}}(\mathrm{ZF})$ with SNR during the experiment equal to 0 dB, $N = 2$, and $\lambda = 0.1$.


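Appendix

This section proposes a simplification of the $\mathrm{MSE}^{d}_{\mathrm{ex}}(\mathrm{ZF})$ metric for the estimator given in (12) with a fixed $\lambda$. Due to the Gaussianity of $\mathbf{e}_{\mathrm{exp}}$, $\mathrm{MSE}^{d}_{\mathrm{ex}}(\mathrm{ZF}) = \infty$ for any $\mathbf{f} \neq \mathbf{0}$ when the non-trimmed estimator $\mathbf{f}^H\mathbf{y}_{\mathrm{exp}}$ is used (infinite moment problem). Using (12), the corresponding metric becomes

$$
\begin{aligned}
[\mathrm{MSE}^{d}_{\mathrm{ex}}(\mathrm{ZF})]_{\mathrm{reg}} = \left(\sigma_u^2 + \frac{\sigma_e^2}{|\theta|^2}\right)\Bigg(& \Pr\{|\mathbf{f}^H\mathbf{y}_{\mathrm{exp}}| > \lambda\}\; E\left[\left|1 - \frac{\theta}{\mathbf{f}^H\mathbf{y}_{\mathrm{exp}}}\right|^2 ;\ |\mathbf{f}^H\mathbf{y}_{\mathrm{exp}}| > \lambda\right]\\
&+ \Pr\{|\mathbf{f}^H\mathbf{y}_{\mathrm{exp}}| \le \lambda\}\; E\left[\left|1 - \frac{\theta\,|\mathbf{f}^H\mathbf{y}_{\mathrm{exp}}|}{\lambda\,\mathbf{f}^H\mathbf{y}_{\mathrm{exp}}}\right|^2 ;\ |\mathbf{f}^H\mathbf{y}_{\mathrm{exp}}| \le \lambda\right]\Bigg),
\end{aligned}
\tag{18}
$$

where ";" denotes conditioning and "reg" signifies the use of the regularized system estimator in (12). Moreover, $\Pr\{|\mathbf{f}^H\mathbf{y}_{\mathrm{exp}}| \le \lambda\} = O(\lambda^2)$, since by the mean value theorem this probability is equal to the area of the region $\{|\mathbf{f}^H\mathbf{y}_{\mathrm{exp}}| \le \lambda\}$, which is of order $O(\lambda^2)$, multiplied by some value of the probability density function of $\mathbf{f}^H\mathbf{y}_{\mathrm{exp}}$ in that region, which is of order $O(1)$. In addition,

$$ E\left[\left|1 - \frac{\theta\,|\mathbf{f}^H\mathbf{y}_{\mathrm{exp}}|}{\lambda\,\mathbf{f}^H\mathbf{y}_{\mathrm{exp}}}\right|^2 ;\ |\mathbf{f}^H\mathbf{y}_{\mathrm{exp}}| \le \lambda\right] = \frac{\lambda^2 + |\theta|^2}{\lambda^2} - \frac{2|\theta|}{\lambda}\,E\left[\mathrm{Re}\left\{e^{j(\angle\theta - \angle\mathbf{f}^H\mathbf{y}_{\mathrm{exp}})}\right\} ;\ |\mathbf{f}^H\mathbf{y}_{\mathrm{exp}}| \le \lambda\right], $$

so that the second term in (18) is $O(\lambda^2)\cdot O(1/\lambda^2) = O(1)$.

Furthermore, if the SNR during training is sufficiently high and the probability mass of $|\mathbf{f}^H\mathbf{y}_{\mathrm{exp}}|$ is concentrated around $|\theta|$, then it can be shown that

$$ E\left[\left|1 - \frac{\theta}{\mathbf{f}^H\mathbf{y}_{\mathrm{exp}}}\right|^2 ;\ |\mathbf{f}^H\mathbf{y}_{\mathrm{exp}}| > \lambda\right] \approx \frac{E\left[|\mathbf{f}^H\mathbf{y}_{\mathrm{exp}} - \theta|^2 ;\ |\mathbf{f}^H\mathbf{y}_{\mathrm{exp}}| > \lambda\right]}{E\left[|\mathbf{f}^H\mathbf{y}_{\mathrm{exp}}|^2 ;\ |\mathbf{f}^H\mathbf{y}_{\mathrm{exp}}| > \lambda\right]}. \tag{19} $$

The same holds even if $\mathbf{f}^H\mathbf{y}_{\mathrm{exp}}$ is a biased estimator of $\theta$ at high training SNR and $|\mathbf{f}^H\mathbf{y}_{\mathrm{exp}}|$ tends to concentrate around a value $\beta$ bounded away from $|\theta|$ (and, of course, from 0).

To show the last claim, we set $X = |\mathbf{f}^H\mathbf{y}_{\mathrm{exp}} - \theta|^2$ and $Y = |\mathbf{f}^H\mathbf{y}_{\mathrm{exp}}|^2$, both conditioned on $|\mathbf{f}^H\mathbf{y}_{\mathrm{exp}}| > \lambda$. Since $Y > \lambda^2$, it also holds that $E[Y] > \lambda^2$. Furthermore, it can be seen that

$$ \left|E\left[\frac{X}{Y}\right] - \frac{E[X]}{E[Y]}\right| \le \frac{1}{\lambda^4}\,E\left[\left|X E[Y] - Y E[X]\right|\right]. \tag{20} $$

At high training SNR, $X \to E[X]$ and $Y \to E[Y]$ in the mean square sense, and therefore it can be easily shown that the right-hand side of (20) converges to 0. To see this, notice that the Cauchy-Schwarz inequality yields

$$ \frac{1}{\lambda^4}\,E\left[\left|X E[Y] - Y E[X]\right|\right] \le \frac{1}{\lambda^4}\left(E\left[\left|X E[Y] - Y E[X]\right|^2\right]\right)^{1/2}. \tag{21} $$

Since $X \to E[X]$ and $Y \to E[Y]$ in the mean square sense, $E[X^2] \to E^2[X]$, $E[Y^2] \to E^2[Y]$, and $E[XY] \to E[X]E[Y]$. For the last case, notice that

$$ \left|E[XY] - E[X]E[Y]\right| \le \sqrt{E\left[|X - E[X]|^2\right]}\,\sqrt{E\left[|Y - E[Y]|^2\right]}, $$

where the last inequality follows again from the Cauchy-Schwarz inequality. By the mean square convergence of $X$ to $E[X]$ and $Y$ to $E[Y]$, the right-hand side of the last inequality tends to 0. Since $E[|XE[Y] - YE[X]|^2] = E[X^2]E^2[Y] - 2E[XY]E[X]E[Y] + E[Y^2]E^2[X]$, this expression tends to $E^2[X]E^2[Y] - 2E^2[X]E^2[Y] + E^2[Y]E^2[X] = 0$, and therefore the right-hand side of (21) tends to 0. Moreover, under the high SNR assumption, the conditional expectations can be approximated by their unconditional counterparts, since for a sufficiently small $\lambda$ their difference is due to an event of probability $O(\lambda^2)$. Therefore,

$$ E\left[\left|1 - \frac{\theta}{\mathbf{f}^H\mathbf{y}_{\mathrm{exp}}}\right|^2 ;\ |\mathbf{f}^H\mathbf{y}_{\mathrm{exp}}| > \lambda\right] \approx \frac{E\left[|\mathbf{f}^H\mathbf{y}_{\mathrm{exp}} - \theta|^2\right]}{E\left[|\mathbf{f}^H\mathbf{y}_{\mathrm{exp}}|^2\right]} + O(\lambda^2). \tag{22} $$

Combining all the above results yields

$$ [\mathrm{MSE}^{d}_{\mathrm{ex}}(\mathrm{ZF})]_{\mathrm{reg}} \approx \left(\sigma_u^2 + \frac{\sigma_e^2}{|\theta|^2}\right)\frac{E\left[|\mathbf{f}^H\mathbf{y}_{\mathrm{exp}} - \theta|^2\right]}{E\left[|\mathbf{f}^H\mathbf{y}_{\mathrm{exp}}|^2\right]} + O(1). $$

The $O(1)$ term is not negligible, but for a sufficiently small $\lambda$ its dependence on $\mathbf{f}$ is insignificant. Hence, for a sufficiently small $\lambda$ and a sufficiently high SNR during training, minimizing $[\mathrm{MSE}^{d}_{\mathrm{ex}}(\mathrm{ZF})]_{\mathrm{reg}}$ is equivalent to minimizing (13).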
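The $O(\lambda^2)$ behaviour of $\Pr\{|\mathbf{f}^H\mathbf{y}_{\mathrm{exp}}| \le \lambda\}$ can be illustrated numerically. For a deterministic $\theta$, (4) gives $\mathbf{f}^H\mathbf{y}_{\mathrm{exp}} \sim \mathcal{CN}(m, s^2)$ with $m = \theta\,\mathbf{f}^H\mathbf{u}_{\mathrm{exp}}$ and $s^2 = \sigma_e^2\|\mathbf{f}\|^2$. Its density at the origin is $e^{-|m|^2/s^2}/(\pi s^2)$, so multiplying by the disc area $\pi\lambda^2$ gives the mean-value-theorem estimate $(\lambda^2/s^2)\,e^{-|m|^2/s^2}$. The sketch below compares this estimate with a Monte Carlo frequency for the illustrative values $m = 0.5$ and $s^2 = 0.25$, and checks that halving $\lambda$ divides the probability by roughly four. The function names are hypothetical.

```python
import math
import random


def small_disc_probability(m: complex, s2: float, lams, n: int = 400_000, seed: int = 5):
    """Monte Carlo Pr{|w| <= lam} for w ~ CN(m, s2), one value per lam in lams."""
    rng = random.Random(seed)
    sd = math.sqrt(s2 / 2)
    counts = [0] * len(lams)
    for _ in range(n):
        w = m + complex(rng.gauss(0.0, sd), rng.gauss(0.0, sd))
        r = abs(w)
        for i, lam in enumerate(lams):
            if r <= lam:
                counts[i] += 1
    return [c / n for c in counts]


def mean_value_estimate(m: complex, s2: float, lam: float) -> float:
    """Disc area pi*lam^2 times the CN(m, s2) density at the origin."""
    return (lam ** 2 / s2) * math.exp(-abs(m) ** 2 / s2)


m, s2 = 0.5 + 0j, 0.25
p_big, p_small = small_disc_probability(m, s2, [0.1, 0.05])
print(p_big, mean_value_estimate(m, s2, 0.1))   # close to each other
print(p_big / p_small)                          # close to 4, i.e. O(lam^2) scaling
```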